Intelligent container reallocation at Microsoft 365

The use of containers in microservices has gained popularity because it facilitates agile development, resource governance, and software maintenance. Container reallocation aims to balance workloads by moving containers across physical machines, and it affects the overall performance of microservice-based systems. However, container scheduling and reallocation remain open problems because of their complexity in real-world scenarios. In this paper, we propose a novel Multi-Phase Local Search (MPLS) algorithm to optimize container reallocation. Experimental results show that our algorithm outperforms state-of-the-art methods. In practice, it has been successfully applied to the Microsoft 365 system to mitigate hotspot machines and balance workloads across the entire system.
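To make the idea of reallocation by local search concrete, the following is a minimal single-phase sketch. It is not the MPLS algorithm, whose phases are not described in this abstract. It assumes one scalar resource demand per container and ignores capacity limits and migration cost; the names `machine_loads` and `rebalance` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def machine_loads(assignment: Dict[str, str],
                  demand: Dict[str, float],
                  machines: List[str]) -> Dict[str, float]:
    """Sum the demand of the containers placed on each machine."""
    loads = {m: 0.0 for m in machines}
    for container, machine in assignment.items():
        loads[machine] += demand[container]
    return loads


def rebalance(assignment: Dict[str, str],
              demand: Dict[str, float],
              machines: List[str],
              max_moves: int = 100) -> Tuple[Dict[str, str], List[Tuple[str, str, str]]]:
    """Greedy local search: repeatedly move one container from the most
    loaded (hotspot) machine to the least loaded one.

    Illustrative assumption: a single resource dimension and no capacity
    or migration-cost constraints.
    """
    assignment = dict(assignment)  # work on a copy; the input is not mutated
    moves: List[Tuple[str, str, str]] = []
    for _ in range(max_moves):
        loads = machine_loads(assignment, demand, machines)
        hot = max(machines, key=lambda m: loads[m])
        cold = min(machines, key=lambda m: loads[m])
        gap = loads[hot] - loads[cold]
        # A move narrows the gap between the two machines only if the
        # container's demand is positive and strictly smaller than the gap.
        candidates = [c for c, m in assignment.items()
                      if m == hot and 0 < demand[c] < gap]
        if not candidates:
            break  # local optimum for this neighbourhood
        # Prefer the container whose demand is closest to half the gap,
        # since that move brings the two loads closest together.
        best = min(candidates, key=lambda c: abs(demand[c] - gap / 2))
        assignment[best] = cold
        moves.append((best, hot, cold))
    return assignment, moves
```

Calling `rebalance(assignment, demand, machines)` returns the new placement together with the list of `(container, source, target)` moves. A production system would additionally need multi-dimensional resources, placement constraints, and a bound on the number of migrations.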
