ScELA: scalable and extensible launching architecture for clusters

As cluster sizes head into tens of thousands, current job launch mechanisms do not scale, as they are limited by resource constraints as well as performance bottlenecks. The job launch process includes two phases: spawning of processes on processors, and information exchange between processes for job initialization. Implementations of various programming models follow distinct protocols for the information exchange phase. We present the design of a scalable, extensible and high-performance job launch architecture for very large scale parallel computing. We present implementations of this architecture which achieve a speedup of more than 700% in launching a simple Hello World MPI application on 10,240 processor cores and also scale to more than 3 times the number of processor cores compared to prior solutions.
