Characterizing and orchestrating VM reservation in geo-distributed clouds to improve resource efficiency

Cloud providers often build a geo-distributed cloud from multiple datacenters in different geographic regions to serve tenants at different locations. Tenants that run large-scale applications typically reserve resources for their peak load in the regions close to their end users so that they can absorb the ever-changing application load, which wastes a large amount of resources. We therefore characterize the VM request patterns of the top tenants in our production public geo-distributed cloud, and open-source four months of VM request traces from the top 20 tenants of our cloud. The characterization shows that the resource usage of large tenants exhibits diverse temporal and spatial patterns across time series, regions, and VM types, and that the peaks of different tenants can be shaved against one another to further reduce the resource reservation cost. Based on these findings, we propose ROS, a resource reservation and VM request scheduling scheme that minimizes the resource reservation cost while satisfying VM allocation requests. Our experiments show that ROS reduces the overall deployment cost by 75.4% and the reserved resources by 60.1%, compared with the tenant-specified reservation strategy.
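
To make the peak-shaving finding concrete, the following is a minimal sketch, not the paper's algorithm: when tenants' demand peaks fall at different times, reserving for the peak of the aggregate demand requires fewer resources than summing per-tenant peaks. The demand series and function names (`per_tenant_reservation`, `pooled_reservation`) are illustrative assumptions; ROS's actual cost model and scheduler are described in the paper.

```python
# Illustrative sketch of cross-tenant peak shaving (synthetic data, not ROS):
# if tenants' demand peaks do not coincide, reserving for the peak of the
# aggregated demand needs fewer resources than summing per-tenant peaks.

def per_tenant_reservation(demands):
    """Tenant-specified strategy: each tenant reserves its own peak."""
    return sum(max(series) for series in demands.values())

def pooled_reservation(demands):
    """Shared pool: reserve the peak of the aggregated demand."""
    horizon = len(next(iter(demands.values())))
    return max(sum(series[t] for series in demands.values())
               for t in range(horizon))

if __name__ == "__main__":
    # Two tenants whose hourly VM demand peaks at different times of day.
    demands = {
        "tenant_a": [40, 90, 100, 60, 30, 20],   # daytime-heavy workload
        "tenant_b": [80, 30, 20, 50, 95, 100],   # nighttime-heavy workload
    }
    solo = per_tenant_reservation(demands)    # 100 + 100 = 200 VMs
    pooled = pooled_reservation(demands)      # max hourly total = 125 VMs
    print(f"per-tenant peaks: {solo} VMs, pooled peak: {pooled} VMs")
    print(f"reservation saved by peak shaving: {100 * (1 - pooled / solo):.1f}%")
```

For these synthetic tenants the pooled reservation saves 37.5% of the reserved VMs, which is the effect the characterization identifies across regions and VM types and that ROS exploits at scale.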
