Enabling Cloud Applications to Negotiate Multiple Resources in a Cost-Efficient Manner

Cloud applications can achieve similar performance with diverse multi-resource configurations, which allows cloud service providers to reduce their operating costs through optimal resource allocation. This paper addresses the problem of multi-resource negotiation under both the service-level agreement (SLA) and cost efficiency, so that the performance requirements of cloud services are satisfied while the cost of resource usage is minimized. Because performance and resource demand are usually application-dependent, the optimization problem is complicated, especially when the space of multi-resource configurations is high-dimensional. To this end, we use reinforcement learning to solve the multi-resource configuration problem while jointly optimizing learning efficiency and the performance guarantee. Our prototype, SmartYARN, extends Apache YARN with this learning algorithm and enables cloud applications to negotiate multiple resources cost-effectively. Extensive evaluations with four typical benchmarks show that SmartYARN effectively reduces the cost of resource usage while remaining compliant with the SLA constraints of cloud services.
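The abstract does not give SmartYARN's algorithmic details, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method: tabular Q-learning (one standard reinforcement-learning technique) over a small grid of hypothetical (CPU cores, memory GB) configurations. The reward is the negated resource cost, minus a fixed penalty whenever a made-up latency model exceeds an SLA bound. The latency model, prices, SLA threshold, penalty, hyperparameters, and all function names (`latency`, `cost`, `step`, `train`, `negotiate`) are hypothetical and introduced purely for illustration.

```python
import random

# Hypothetical configuration grid: CPU cores and memory (GB), 1..4 each.
CPU_LEVELS = [1, 2, 3, 4]
MEM_LEVELS = [1, 2, 3, 4]
# Resize actions as (delta_cpu, delta_mem): stay, +cpu, -cpu, +mem, -mem.
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

SLA_LATENCY = 8.5   # hypothetical SLA bound on latency
SLA_PENALTY = 50.0  # hypothetical penalty added to the cost when the SLA is violated


def latency(cpu, mem):
    """Made-up performance model; a real system would measure this online."""
    return 12.0 / cpu + 6.0 / mem


def cost(cpu, mem):
    """Made-up prices: 3 units per CPU core, 1 unit per GB of memory."""
    return 3 * cpu + mem


def meets_sla(cpu, mem):
    return latency(cpu, mem) <= SLA_LATENCY


def step(state, action):
    """Apply a resize action, clamped to the grid; return (next_state, reward)."""
    cpu = min(max(state[0] + action[0], CPU_LEVELS[0]), CPU_LEVELS[-1])
    mem = min(max(state[1] + action[1], MEM_LEVELS[0]), MEM_LEVELS[-1])
    penalty = 0.0 if meets_sla(cpu, mem) else SLA_PENALTY
    # Reward is the negated resource cost, minus a penalty on SLA violation.
    return (cpu, mem), -(cost(cpu, mem) + penalty)


def greedy(q, state):
    """Index of the highest-valued action in `state` (ties go to the first)."""
    return max(range(len(ACTIONS)), key=lambda i: q[(state, i)])


def train(episodes=20000, steps=10, alpha=0.5, gamma=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning: random first action per episode, epsilon-greedy afterwards."""
    rng = random.Random(seed)
    states = [(c, m) for c in CPU_LEVELS for m in MEM_LEVELS]
    q = {(s, a): 0.0 for s in states for a in range(len(ACTIONS))}
    for _ in range(episodes):
        state = rng.choice(states)
        for t in range(steps):
            if t == 0 or rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(q, state)
            nxt, reward = step(state, ACTIONS[a])
            # One-step target: reward plus discounted best Q-value of the next state.
            target = reward + gamma * q[(nxt, greedy(q, nxt))]
            q[(state, a)] += alpha * (target - q[(state, a)])
            state = nxt
    return q


def negotiate(q, start=(1, 1), max_steps=20):
    """Follow the learned greedy policy from `start`; return the final configuration."""
    state = start
    for _ in range(max_steps):
        state, _ = step(state, ACTIONS[greedy(q, state)])
    return state


if __name__ == "__main__":
    print(negotiate(train()))  # prints (2, 3) with these made-up numbers
```

With these invented numbers, both (2, 3) and (3, 2) satisfy the SLA (latencies 8.0 and 7.0), mirroring the observation that diverse configurations can deliver similar performance, but (2, 3) is cheaper (cost 9 versus 11), so the greedy policy starting from (1, 1) settles there. A real deployment would replace the analytic latency model with measured performance and face a much larger configuration space, which is where learning efficiency becomes the central concern.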
