Virtual InfiniBand clusters for HPC clouds

High Performance Computing (HPC) employs fast interconnect technologies to provide low communication and synchronization latencies for tightly coupled parallel compute jobs. Contemporary HPC clusters have a fixed capacity and static runtime environments; they cannot elastically adapt to dynamic workloads, and they provide only a limited selection of applications, libraries, and system software. In contrast, a cloud model for HPC clusters promises more flexibility, because it makes elastic virtual clusters available on demand, which is not possible with physically owned clusters. In this paper, we present an approach that makes it possible to use InfiniBand clusters for HPC cloud computing. We propose a performance-driven design of an HPC IaaS layer for InfiniBand, which provides throughput- and latency-aware virtualization of nodes, networks, and network topologies, as well as an approach to an HPC-aware, multi-tenant cloud management system for elastic virtualized HPC compute clusters.
