Paper information - Leveraging checkpoint/restore to optimize utilization of cloud compute resources

Leveraging checkpoint/restore to optimize utilization of cloud compute resources

Cloud computing services have varying performance characteristics that the cloud provider often hides from the user. Thus, it is difficult for a user to make operational decisions about when and where to run their jobs. In this paper, we present a series of experiments on a cloud computing platform to understand these characteristics and then present a series of strategies to optimize utilization on these cloud platforms. Specifically, our experiments were performed on the lowest tier of Amazon Elastic Compute Cloud (EC2) resources, and initial tests measured performance characteristics of these resources at different locations over time. Testing revealed that certain performance measures could be improved by location choice. For short-duration jobs, CPU performance was nearly constant, but storage performance had more variability. For long-duration jobs, CPU throttling caused a significant penalty. Using a system of checkpoint and migration between virtual machines allowed this CPU penalty to be avoided, resulting in significant savings and improved runtime.
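The trade-off described in the abstract (a long job slows down once its VM is throttled, while checkpointing and migrating to a fresh VM costs some overhead) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the authors' method or measurements: the function names, the single fixed full-speed window `burst_s`, the constant `throttled_speed` fraction, and the fixed `migrate_overhead_s` per migration are all hypothetical simplifications, and the model assumes every fresh VM starts unthrottled.

```python
import math


def runtime_without_migration(work_s, burst_s, throttled_speed):
    """Wall-clock seconds when the job stays on a single VM.

    Assumption (hypothetical model): the VM runs at full speed for
    burst_s seconds, then at throttled_speed (a fraction of full
    speed, 0 < throttled_speed <= 1) for the remaining work.
    """
    if work_s <= burst_s:
        return work_s
    return burst_s + (work_s - burst_s) / throttled_speed


def runtime_with_migration(work_s, burst_s, migrate_overhead_s):
    """Wall-clock seconds when the job checkpoints and migrates to a
    fresh VM each time the full-speed window is used up.

    Assumption (hypothetical model): each checkpoint/restore cycle
    costs a fixed migrate_overhead_s seconds, and every fresh VM
    offers a new full-speed window of burst_s seconds.
    """
    if work_s <= 0:
        return 0
    # One migration per exhausted window, none after the final one.
    migrations = math.ceil(work_s / burst_s) - 1
    return work_s + migrations * migrate_overhead_s
```

With illustrative values (10000 s of full-speed work, a 3600 s full-speed window, throttling to one quarter speed, 60 s per migration), the model gives 29200 s on a single VM versus 10120 s with two migrations. A job that fits inside one window sees no difference, which matches the abstract's observation that the penalty applies to long-duration jobs.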

John A. Chandy | Rohit K. Mehta
