Performance Evaluation of Open MPI on Cray XE/XK Systems

Open MPI is a widely used open-source implementation of the MPI-2 standard that supports a variety of platforms and interconnects. Current versions of Open MPI, however, lack support for the Cray XE6 and XK6 architectures -- both of which use the Gemini System Interconnect. In this paper, we present extensions to natively support these architectures within Open MPI, describe and propose solutions for performance and scalability bottlenecks, and provide an extensive evaluation of our implementation, which is the first completely open-source MPI implementation for the Cray XE/XK system families used at 49,152 processes. Application and micro-benchmark results show that the performance and scaling characteristics of our implementation are similar to the vendor-supplied MPI's. Micro-benchmark results show short-data 1-byte and 1,024-byte message latencies of 1.20 μs and 4.13 μs, which are 10.00% and 39.71% better than the vendor-supplied MPI's, respectively. Our implementation achieves a bandwidth of 5.32 GB/s at 8 MB, which is similar to the vendor-supplied MPI's bandwidth at the same message size. Two Sequoia benchmark applications, LAMMPS and AMG2006, were also chosen to evaluate our implementation at scales up to 49,152 cores -- where we exhibited similar performance and scaling characteristics when compared to the vendor-supplied MPI implementation. LAMMPS achieved a parallel efficiency of 88.20% at 49,152 cores using Open MPI, which is on par with the vendor-supplied MPI's achieved parallel efficiency.