Blue Gene/L performance tools

Good performance monitoring is the basis of modern performance analysis tools for application optimization. We provide a variety of such tools for the new Blue Gene®/L supercomputer. These tools fall into two categories: single-node performance tools and multinode performance tools. For single nodes, we provide standard interfaces and libraries, such as PAPI and libHPM, that give applications running on the Blue Gene/L compute nodes access to the hardware performance counters. For multiple nodes, we focus on tools that analyze Message Passing Interface (MPI) behavior. These tools first collect message-passing trace data while a program runs; graphical tools then use that trace data to analyze the application's behavior. Using case studies of application optimization, we demonstrate the usefulness and applicability of the current prototype tools.
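The two-phase MPI workflow described above works in two steps. First, a trace is collected at run time. Afterwards, that trace is analyzed. The sketch below is a minimal, hypothetical Python illustration of that idea, not the Blue Gene/L tool implementation. `TraceEvent`, `Tracer`, `summarize`, and `fake_send` are invented names, and the send function is a stub so the example runs without MPI. Real MPI tracing libraries commonly intercept calls through the MPI profiling interface (the `PMPI_` entry points). Here a plain Python wrapper plays that role.

```python
from collections import defaultdict
from dataclasses import dataclass
import time


@dataclass
class TraceEvent:
    """One trace record: which rank made which call, its payload size, and when."""
    rank: int
    call: str
    nbytes: int
    start: float
    end: float


class Tracer:
    """Collection phase (hypothetical): wraps communication calls and logs one event per call."""

    def __init__(self):
        self.events = []

    def traced(self, call_name, func):
        # Return a wrapper that timestamps the wrapped call and appends a TraceEvent.
        def wrapper(rank, payload, *args, **kwargs):
            start = time.perf_counter()
            result = func(rank, payload, *args, **kwargs)
            end = time.perf_counter()
            self.events.append(TraceEvent(rank, call_name, len(payload), start, end))
            return result
        return wrapper


def summarize(events):
    """Analysis phase (hypothetical): aggregate call count, bytes, and elapsed time per (rank, call)."""
    summary = defaultdict(lambda: {"calls": 0, "bytes": 0, "seconds": 0.0})
    for ev in events:
        entry = summary[(ev.rank, ev.call)]
        entry["calls"] += 1
        entry["bytes"] += ev.nbytes
        entry["seconds"] += ev.end - ev.start
    return dict(summary)


def fake_send(rank, payload, dest):
    # Hypothetical stand-in for a real message-passing call; performs no communication.
    return None


if __name__ == "__main__":
    tracer = Tracer()
    send = tracer.traced("send", fake_send)
    send(0, b"x" * 1024, dest=1)
    send(0, b"x" * 2048, dest=1)
    send(1, b"x" * 512, dest=0)
    for (rank, call), stats in sorted(summarize(tracer.events).items()):
        print(rank, call, stats["calls"], stats["bytes"])
```

Run as a script, it prints `0 send 2 3072` and `1 send 1 512`. That is one line per (rank, call) pair, giving the call count and total bytes. Keeping collection and analysis separate mirrors the split in the abstract. The run-time wrapper only appends records, and all aggregation happens afterwards. This is the kind of data a graphical viewer could then display.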
