A Locality-Aware Communication Layer for Virtualized Clusters

Locality-aware HPC communication stacks emerged with SMP systems in the early 2000s. Common MPI implementations provide communication paths optimized for the underlying transport: two processes residing on the same SMP node communicate via local shared memory, whereas inter-node communication is realized over the HPC interconnect. As virtualization gains importance in HPC, locality awareness becomes relevant again. HPC systems commonly lack support for efficient communication among co-located VMs, i.e., they harness the local InfiniBand adapter instead of the shared physical memory of the host system. This incurs significant performance penalties, especially for communication-intensive applications. IVShmem provides a means to exploit the host's memory as a communication medium between co-located VMs. In this paper we present a locality-aware MPI layer that leverages this technology for efficient intra-host inter-VM communication. We evaluate our implementation by comparing it to a non-locality-aware communication layer in virtualized clusters.
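
The core idea can be illustrated with a minimal transport-selection sketch in C. The fragment below is illustrative only: the type and function names (peer_info_t, ivshmem_channel_send, ib_channel_send, locality_aware_send) are hypothetical placeholders and not part of the paper's implementation. It merely shows the decision such a locality-aware layer has to make on every send: route messages between co-located VMs through the IVShmem-backed shared memory of the host, and fall back to the InfiniBand adapter for all other peers.

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical peer descriptor: which physical host a VM runs on, and the
 * MPI rank of the process inside that VM. */
typedef struct {
    char host_uuid[64];   /* identifier of the physical host */
    int  rank;            /* MPI rank of the peer process */
} peer_info_t;

/* Placeholder back-ends; real implementations would copy into an
 * IVShmem-backed ring buffer or post InfiniBand verbs work requests. */
static int ivshmem_channel_send(int peer_rank, const void *buf, size_t len)
{
    (void)peer_rank; (void)buf; (void)len;
    return 0;  /* shared-memory path on the same host */
}

static int ib_channel_send(int peer_rank, const void *buf, size_t len)
{
    (void)peer_rank; (void)buf; (void)len;
    return 0;  /* interconnect path between hosts */
}

static bool is_co_located(const peer_info_t *self, const peer_info_t *peer)
{
    /* Two VMs count as co-located if they report the same physical-host UUID. */
    return strcmp(self->host_uuid, peer->host_uuid) == 0;
}

int locality_aware_send(const peer_info_t *self, const peer_info_t *peer,
                        const void *buf, size_t len)
{
    if (is_co_located(self, peer))
        return ivshmem_channel_send(peer->rank, buf, len);  /* local shared memory */
    return ib_channel_send(peer->rank, buf, len);           /* InfiniBand */
}

The point of the sketch is only the branch in locality_aware_send: a non-locality-aware layer would take the InfiniBand path unconditionally, even for VMs sharing the same physical memory.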
