Cloud Resource Provisioning to Extend the Capacity of Local Resources in the Presence of Failures

In this paper, we investigate Cloud computing resource provisioning to extend the computing capacity of local clusters in the presence of failures. We consider three steps in resource provisioning: resource brokering, dispatch sequences, and scheduling. The proposed brokering strategy is based on the stochastic analysis of routing in distributed parallel queues and takes into account the response times of the Cloud provider and the local cluster as well as the computing cost on both sides. Moreover, we propose dispatching with probabilistic and deterministic sequences to redirect requests to the resource providers. We also incorporate checkpointing into some well-known scheduling algorithms to provide a fault-tolerant environment. We propose two cost-aware and failure-aware provisioning policies that can be used by an organization that operates a cluster managed by virtual machine technology and seeks to use resources from a public Cloud provider. Simulation results demonstrate that the proposed policies improve the response time of users' requests by a factor of 4.10 under a moderate load, at a limited cost on a public Cloud.
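To illustrate the difference between probabilistic and deterministic dispatch sequences, the following minimal Python sketch redirects a stream of requests to either the public Cloud or the local cluster. It assumes that the brokering step has already produced a routing probability `p_cloud`; that name, the two function names, and the evenly spaced sequence used for the deterministic case are assumptions made for illustration and are not taken from the paper, whose actual sequences may be constructed differently.

```python
import math
import random


def probabilistic_dispatch(p_cloud, n, seed=0):
    """Send each of n requests to the Cloud independently with probability p_cloud."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return ["cloud" if rng.random() < p_cloud else "local" for _ in range(n)]


def deterministic_dispatch(p_cloud, n):
    """Spread Cloud-bound requests as evenly as possible over n requests.

    Request k goes to the Cloud when floor((k + 1) * p_cloud) exceeds
    floor(k * p_cloud), which yields a fixed, regular pattern whose
    long-run Cloud fraction is p_cloud (illustrative choice, not the
    paper's construction).
    """
    return [
        "cloud"
        if math.floor((k + 1) * p_cloud) - math.floor(k * p_cloud) == 1
        else "local"
        for k in range(n)
    ]


if __name__ == "__main__":
    print(probabilistic_dispatch(0.25, 8))
    print(deterministic_dispatch(0.25, 8))
```

Both functions target the same long-run share of requests sent to the Cloud. The probabilistic version can produce bursts of consecutive requests to one provider, while the deterministic version fixes the pattern in advance and keeps the spacing regular.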
