A study of MPI performance analysis tools on Blue Gene/L

Applications on today's massively parallel supercomputers rely on performance analysis tools to guide them toward scalable performance on thousands of processors. However, conventional parallel performance analysis tools face serious scalability problems because of the large volume of data they may need to collect. In this paper, we discuss the scalability of MPI performance analysis on Blue Gene/L (BG/L), at the time of writing the world's fastest supercomputing platform. We present an experimental study of existing MPI performance tools that were ported to BG/L from other platforms. These tools fall into two categories: profiling tools, which collect timing summaries, and tracing tools, which collect a sequence of time-stamped events. Profiling tools produce small data volumes and can scale well, whereas tracing tools tend to scale poorly. The study discusses the advantages and disadvantages of the tools in each category, and its findings should help guide the design of future performance tools.
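
The difference in data volume between the two tool categories can be illustrated with a small simulation. The sketch below is a hypothetical illustration, not one of the tools studied: the names `Profiler`, `Tracer`, `timed_call`, and `simulate` are invented for this example, and the "MPI" calls are plain strings timing a no-op rather than real MPI routines. (Real MPI tools commonly intercept calls through the MPI standard's PMPI profiling interface.) It shows that a profile keeps one summary per distinct function, so its size per process does not depend on run length, while a trace stores one record per call and grows with every iteration.

```python
import time
from collections import defaultdict

# Hypothetical stand-in "MPI" calls made once per iteration. These are plain
# strings for illustration only; no MPI library is involved.
CALLS = ("MPI_Send", "MPI_Recv", "MPI_Allreduce")


class Profiler:
    """Profiling: one running summary (call count, total seconds) per function."""

    def __init__(self):
        self.summary = defaultdict(lambda: [0, 0.0])

    def record(self, name, start, end):
        entry = self.summary[name]
        entry[0] += 1            # number of calls to this function
        entry[1] += end - start  # accumulated time spent in it

    def num_records(self):
        return len(self.summary)


class Tracer:
    """Tracing: one time-stamped event (name, start, end) per call."""

    def __init__(self):
        self.events = []

    def record(self, name, start, end):
        self.events.append((name, start, end))

    def num_records(self):
        return len(self.events)


def timed_call(name, collectors, fn):
    """Time a single call to fn and pass the interval to every collector."""
    start = time.perf_counter()
    result = fn()
    end = time.perf_counter()
    for collector in collectors:
        collector.record(name, start, end)
    return result


def simulate(iterations, processes):
    """Return (profile_records, trace_records) summed over all processes."""
    profile_total = trace_total = 0
    for _rank in range(processes):
        profiler, tracer = Profiler(), Tracer()
        for _ in range(iterations):
            for name in CALLS:
                timed_call(name, (profiler, tracer), lambda: None)
        profile_total += profiler.num_records()
        trace_total += tracer.num_records()
    return profile_total, trace_total


if __name__ == "__main__":
    for procs in (16, 128, 1024):
        prof, trace = simulate(iterations=100, processes=procs)
        print(f"{procs:5d} processes: {prof:6d} profile records, "
              f"{trace:8d} trace records")
```

In this sketch both totals grow with the number of processes, but only the trace total also grows with the number of calls each process makes, which is why long runs on many processors make tracing expensive.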
