On Scalability for MPI Runtime Systems

The future of high performance computing, as currently foretold, will gravitate toward machines with hundreds of thousands to millions of nodes, harnessing the computing power of billions of cores. While the hardware roadmap toward that scale is comparatively well charted, the software infrastructure remains unclear. Whatever form that infrastructure takes, efficiently running parallel applications on such large machines will require optimized runtime environments that are both scalable and resilient. In particular, assuming a future where the Message Passing Interface (MPI) remains a major programming paradigm, MPI implementations will have to seamlessly launch and manage large-scale applications on resources several orders of magnitude larger than today's. In this paper, we present a modified version of the Open MPI runtime that has been adapted for scalability. We evaluate its performance and compare it with two widely used runtime systems, the default Open MPI runtime and MPICH2, over various underlying launching systems. The evaluation demonstrates a significant improvement over the state of the art. We also discuss the basic requirements for an exascale-ready parallel runtime.
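To make concrete what the runtime is responsible for, the following is a minimal sketch, not taken from the paper, of how one might probe startup behavior of an MPI runtime: after MPI_Init completes, each rank times the first global barrier (a rough indicator of how long full wire-up between processes takes in many implementations) and the maximum across ranks is reported. It assumes only standard MPI calls and a launcher such as mpirun or mpiexec.

```c
/* Sketch: observe post-MPI_Init wire-up cost via the first global barrier.
 * Compile with an MPI wrapper (e.g. mpicc) and run with the launcher under
 * test, e.g.: mpirun -np 1024 ./startup_probe */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    double t0, t1, local, max_cost;

    MPI_Init(&argc, &argv);           /* runtime launches/wires up processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    t0 = MPI_Wtime();                 /* valid only after MPI_Init           */
    MPI_Barrier(MPI_COMM_WORLD);      /* first collective: all ranks reachable */
    t1 = MPI_Wtime();

    local = t1 - t0;
    MPI_Reduce(&local, &max_cost, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks: max first-barrier cost after MPI_Init = %f s\n",
               size, max_cost);

    MPI_Finalize();
    return 0;
}
```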
