Parallel Performance Wizard: framework and techniques for parallel application optimization

Developing a high-performance parallel application is difficult, and developers often must rely on performance analysis tools to improve the performance of their applications. While many tools support the analysis of message-passing programs, tool support is limited for applications written in other programming models, such as those in the increasingly important partitioned global address space (PGAS) family. Existing tools that support message-passing models are difficult to extend to other parallel models because of the differences between the paradigms. In this dissertation, we present work on the Parallel Performance Wizard (PPW) system, the first general-purpose performance system for parallel application optimization. The research is divided into three parts. First, we introduce a model-independent PPW performance tool framework for parallel application analysis. Next, we present a new scalable, model-independent PPW analysis system designed to automatically detect, diagnose, and possibly resolve bottlenecks within a parallel application. Finally, we discuss case studies that evaluate the effectiveness of PPW, and we conclude with contributions and future directions for the PPW project.
