Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud System

Kubernetes (k8s) has the potential to unify the distributed edge and the cloud but lacks a scheduling framework designed specifically for edge-cloud systems. Moreover, the hierarchical distribution of heterogeneous resources and the complex dependencies among requests and resources make modeling and scheduling in k8s-oriented edge-cloud systems particularly difficult. In this paper, we introduce KaiS, a learning-based scheduling framework for such edge-cloud systems that improves the long-term throughput rate of request processing. First, we design a coordinated multi-agent actor-critic algorithm to handle decentralized request dispatch and the dynamic dispatch spaces within the edge cluster. Second, to accommodate diverse system scales and structures, we use graph neural networks to embed system state information and combine the resulting embeddings with multiple policy networks, reducing the orchestration dimensionality through stepwise scheduling. Finally, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration, and present an implementation design that deploys these algorithms alongside native k8s components. Experiments using real workload traces show that KaiS learns appropriate scheduling policies irrespective of request arrival patterns and system scales. Moreover, KaiS improves the average system throughput rate by 14.3% while reducing scheduling cost by 34.7% compared to baselines.
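
To make the two-time-scale structure concrete, the following minimal Python sketch (not the authors' implementation) shows how fast per-node request dispatch could interleave with slower, cluster-wide service orchestration. The class names, state dimensions, and the trivial "scale up the busiest node" rule are illustrative assumptions; the actor-critic learning updates and GNN state embeddings described above are omitted for brevity.

```python
# Hypothetical sketch of a two-time-scale scheduling loop: per-edge-node agents
# dispatch requests on a fast time scale, while a global orchestrator re-plans
# service placement on a slower one. All names and numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)

N_EDGE_NODES = 4          # one dispatch agent per edge node (assumed size)
N_DISPATCH_TARGETS = 5    # e.g. 4 edge nodes + 1 cloud
FAST_STEPS_PER_SLOW = 20  # dispatch decisions between orchestration updates


def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()


class DispatchAgent:
    """Toy actor for one edge node: a linear softmax policy over dispatch targets."""

    def __init__(self, state_dim):
        self.theta = np.zeros((state_dim, N_DISPATCH_TARGETS))

    def act(self, state):
        probs = softmax(state @ self.theta)
        return rng.choice(N_DISPATCH_TARGETS, p=probs)


def orchestrate(load_per_node):
    """Toy slow-time-scale step: pick the busiest node for extra service replicas."""
    return int(np.argmax(load_per_node))


state_dim = 6
agents = [DispatchAgent(state_dim) for _ in range(N_EDGE_NODES)]
load = np.zeros(N_DISPATCH_TARGETS)

for slow_step in range(3):                    # slow time scale: orchestration
    for _ in range(FAST_STEPS_PER_SLOW):      # fast time scale: request dispatch
        for agent in agents:
            state = rng.normal(size=state_dim)  # stand-in for local observations
            target = agent.act(state)
            load[target] += 1.0
    hot_node = orchestrate(load)
    print(f"slow step {slow_step}: scale up services on node {hot_node}")
    load[:] = 0.0                             # reset counters for the next window
```

In KaiS proper, the dispatch policies would be trained with the coordinated multi-agent actor-critic method and the orchestrator would act on GNN embeddings of the cluster state; this sketch only illustrates how the two decision loops nest.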
