Information Models: Creating and Preserving Value in Volatile Cloud Resources

Volatile resources are surplus cloud resources not consumed by high-priority foreground (reserved/on-demand) load. A growing number of users exploit these resources. Today, cloud operators provide no statistical characterization of volatile resources. We consider how releasing such statistics could improve user value by studying Amazon's 608 EC2 Spot Instance types. Results show that as few as two parameters, such as (average, 90th percentile), can increase user value by 30%. These results are robust over nearly four-fifths (475 of 608) of instance types. Beyond competitive concerns, cloud operators are reluctant to share volatile resource statistics because the statistics might be treated as a service-level agreement (SLA) and thus constrain the operators' ability to serve foreground load. We show that clever resource management can allay such concerns. We study two plausible classes of foreground load changes, showing one class where the concern is indeed valid and another where it is not. We design two online resource management algorithms that detect foreground load variation and adapt to maintain a statistical SLA. The algorithms improve not only the ability to maintain guarantees and user value but also user experience, reducing job failures by 50%. These results apply to the Stable and Transition classes of instance types, which account for nearly all instance types (577 of 608).
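As an illustration of the kind of two-parameter characterization mentioned above, the following minimal sketch computes an (average, 90th percentile) summary from a list of observations. It is not the method used in this work: the function name `summarize_availability`, the choice of availability-interval durations as the measured quantity, and the nearest-rank percentile definition are all assumptions made for the example.

```python
import statistics


def summarize_availability(durations_hours):
    """Return (average, 90th percentile) for a list of observations.

    Assumption: each observation is the length, in hours, of one
    interval during which a volatile resource stayed available.
    """
    if not durations_hours:
        raise ValueError("need at least one observation")
    ordered = sorted(durations_hours)
    average = statistics.fmean(ordered)
    # Nearest-rank percentile: the smallest observation such that at
    # least 90% of the samples are at or below it. Integer arithmetic
    # computes ceil(0.9 * n) without floating-point rounding.
    rank = -(-9 * len(ordered) // 10)
    p90 = ordered[rank - 1]
    return average, p90


if __name__ == "__main__":
    # Hypothetical trace, not measured data.
    trace = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(summarize_availability(trace))
```

A pair like this is compact enough for an operator to publish per instance type, and a user could compare it against a job's expected runtime before choosing where to run.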
