Zeroing memory deallocator to reduce checkpoint sizes in virtualized HPC environments

Virtualization has become an indispensable tool in data centers and cloud environments for flexibly assigning virtual machines (VMs) to resources. It is also becoming increasingly attractive for high-performance computing (HPC), mainly because the strong isolation of VMs enables: (1) sharing cluster nodes and thereby optimizing the system's overall utilization; (2) load balancing by means of migration, which benefits from the reduction of residual dependencies; and (3) creating system-level checkpoints that increase fault tolerance in an application-transparent way. On the downside, the additional virtualization layer conceals information that is only available at the process level. This information directly influences the checkpoint size, which should be kept as small as possible. In this paper, we propose a novel technique for reducing checkpoint size in virtualized environments. We exploit the fact that the hypervisor detects zero pages and omits them when capturing a checkpoint, and that it can apply compression to reduce the checkpoint size further. We therefore fill freed memory regions with zeros, which supports both the zero-page detection and the compression. We evaluate our approach using HPC applications as examples. The results show a reduction of the checkpoint size by up to 9% when compression is disabled in the hypervisor and by up to 49% when it is enabled. Furthermore, memory zeroing reduces VM migration time by up to 10% when compression is disabled and by up to 60% when it is enabled.
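
The effect of zeroing freed memory on checkpoint size can be illustrated with a small model. The following Python sketch is not the implementation evaluated in the paper. It assumes a 4 KiB page size, models guest memory as a byte array, and uses zlib as a stand-in for the hypervisor's compression. The names `PAGE_SIZE`, `checkpoint_size`, `zeroing_free`, and `random_memory` are hypothetical and chosen for illustration only.

```python
import random
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


def checkpoint_size(memory, compress=False):
    """Return the bytes written by a checkpoint that skips all-zero pages.

    With compress=True, every non-zero page is compressed with zlib,
    which here only stands in for the hypervisor's compression.
    """
    total = 0
    for offset in range(0, len(memory), PAGE_SIZE):
        page = memory[offset:offset + PAGE_SIZE]
        if not any(page):
            continue  # zero page: omitted from the checkpoint
        total += len(zlib.compress(page)) if compress else len(page)
    return total


def zeroing_free(memory, start, length):
    """Model a deallocator that overwrites the freed region with zeros."""
    memory[start:start + length] = bytes(length)


def random_memory(pages, seed=0):
    """Build guest memory filled with incompressible pseudo-random data."""
    rng = random.Random(seed)
    return bytearray(rng.getrandbits(8) for _ in range(pages * PAGE_SIZE))


if __name__ == "__main__":
    memory = random_memory(8)
    print("before free:", checkpoint_size(memory),
          checkpoint_size(memory, compress=True))
    # Free two and a half pages starting at the second page.
    zeroing_free(memory, PAGE_SIZE, 2 * PAGE_SIZE + PAGE_SIZE // 2)
    print("after free: ", checkpoint_size(memory),
          checkpoint_size(memory, compress=True))
```

In this model, fully freed pages disappear from the checkpoint through zero-page detection alone, while a partially freed page only shrinks once compression is enabled. This is consistent with the larger reductions reported when compression is enabled.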
