Heterogeneous Task Co-location in Containerized Cloud Computing Environments

Although cloud computing has become a mainstream industrial computing paradigm, low resource utilization remains a common problem in most warehouse-scale datacenters, wasting hardware resources, infrastructure investment, and energy. As workload diversity becomes an essential characteristic of modern datacenters, co-locating tasks from different workloads on the same compute cluster has gained popularity as a heuristic approach to improving resource utilization. Existing co-location methods improve resource efficiency to a certain degree, but they usually sacrifice application QoS when handling resource interference between applications. This paper proposes a containerized task co-location (CTCL) scheduler that improves resource utilization and minimizes the task eviction rate. The CTCL scheduler (1) applies an elastic task co-location strategy to improve resource utilization, and (2) supports a dynamic task rescheduling mechanism to prevent the severe QoS degradation caused by frequent task evictions. We evaluate the approach in terms of resource efficiency and rescheduling cost using the ContainerCloudSim simulator. Experiments with the Alibaba 2018 workload traces show that CTCL can improve overall resource efficiency by 38% and reduce the rescheduling rate by 99%.
