Scaling physics and material science applications on a massively parallel Blue Gene/L system

Blue Gene/L represents a new way to build supercomputers, using a large number of low-power processors together with multiple integrated interconnection networks. Whether real applications can scale to tens of thousands of processors on a machine like Blue Gene/L has been an open question. In this paper, we describe early experience with several physics and material science applications on a 32,768-node Blue Gene/L system, which was installed recently at the Lawrence Livermore National Laboratory. Our study reveals some problems in the applications and in the current software implementation but, overall, shows excellent scaling of these applications to 32K nodes on the current Blue Gene/L system. While there is clearly room for improvement, these results represent the first proof point that MPI applications can effectively scale to over ten thousand processors. They also validate the scalability of the hardware and software architecture of Blue Gene/L.
