Virtual machine aware communication libraries for high performance computing

As modern computing systems grow in size and complexity to meet the demanding requirements of High Performance Computing (HPC) applications, manageability is becoming a critical concern for achieving both high performance and high productivity. Meanwhile, virtual machine (VM) technologies have become popular in both industry and academia because of features designed to ease system management and administration. While a VM-based environment can greatly improve manageability on large-scale computing systems, concerns over performance have largely kept the HPC community from embracing VM technologies. In this paper, we follow three steps to demonstrate that near-native performance can be achieved in a VM-based environment for HPC. First, we propose Inter-VM Communication (IVC), a VM-aware communication library that supports efficient shared memory communication among computing processes on the same physical host, even when they run in different VMs. This is critical for multi-core systems, especially when individual computing processes are hosted in separate VMs to achieve fine-grained control. Second, we design a VM-aware MPI library based on MVAPICH2 (a popular MPI library), called MVAPICH2-ivc, which allows HPC MPI applications to benefit transparently from IVC. Finally, we evaluate MVAPICH2-ivc on clusters featuring multi-core systems and high performance InfiniBand interconnects. Our evaluation demonstrates that MVAPICH2-ivc can improve NAS Parallel Benchmark performance by up to 11% in a VM-based environment on eight-core Intel Clovertown systems, where each compute process is in a separate VM. A detailed performance evaluation for up to 128 processes (64-node dual-socket single-core systems) shows only a marginal performance overhead for MVAPICH2-ivc compared with MVAPICH2 running in a native environment. This study indicates that performance should no longer be a barrier preventing HPC environments from taking advantage of the various features available through VM technologies.
