Backup or Not: An Online Cost Optimal Algorithm for Data Analysis Jobs Using Spot Instances

Recently, large-scale public cloud providers have begun to offer spot instances. This type of instance has become increasingly popular with cloud users because of its convenient access mode and low price, especially for big data analysis jobs with high-performance computing requirements. However, because spot instances are generally unstable, using them carries the risk of interruption, which leads to extra costs for re-executing jobs. Such costs can be greatly reduced if a backup is made at the right time before an interruption. For convenience and cost efficiency, users can store backup data files for later job recovery in the StaaS (Storage-as-a-Service) offered by the same cloud provider whose spot instances they use. Since making backups too often also increases costs, users must decide when to back up by weighing the risk of an abrupt interruption in the future. However, it is hard to know or predict precisely when such an interruption will occur. To solve this problem, in this article, we propose an online algorithm that guides cloud users on when to make backups while executing big data analysis jobs on spot instances, without requiring any information about future interruptions. We prove theoretically that our online algorithm guarantees a competitive ratio of less than 2. Finally, through extensive experiments, we verify the effectiveness of our online algorithm in reducing the additional cost caused by interruptions when using spot instances, and find that it still achieves stable cost optimization even when interruptions occur frequently.
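
The competitive-ratio claim can be stated more precisely as follows. This is a minimal sketch in our own notation. It assumes the standard strict definition from competitive analysis. The symbols sigma, C_ALG, C_OPT and rho are hypothetical and are not taken from the article. The abstract does not give the article's exact definition, for example whether it allows an additive constant.

```latex
% Hypothetical notation (not defined in the abstract):
%   \sigma                : a sequence of spot-instance interruption times
%   C_{\mathrm{ALG}}(\sigma) : extra cost (backups + re-execution) incurred by the online algorithm
%   C_{\mathrm{OPT}}(\sigma) : extra cost of an offline policy that knows \sigma in advance
%   \rho                  : the competitive ratio
C_{\mathrm{ALG}}(\sigma) \le \rho \, C_{\mathrm{OPT}}(\sigma)
  \quad \text{for every } \sigma, \qquad \rho < 2
```

In words, the bound covers every interruption pattern. Under that pattern, the online algorithm's extra cost stays below twice that of an offline policy that knows every interruption time in advance.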
