Decentralized Kubernetes Federation Control Plane

This position paper presents our vision for a distributed, decentralized Kubernetes federation control plane. The goal is to support federations of thousands of Kubernetes clusters, as required by next-generation edge cloud use cases. Our review of the literature and our experience with the current centralized, state-of-the-art Kubernetes federation controllers show that these controllers cannot scale to a sufficient size, and that centralization constitutes an unacceptable single point of failure. Our proposed system maintains cluster autonomy, allows clusters to handle error conditions collaboratively, and scales to support edge cloud use cases. Our approach is based on a database of conflict-free replicated data types (CRDTs) shared among all clusters in the federation, together with algorithms that make use of this data.
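To illustrate why a CRDT-based shared database needs no central coordinator, the following is a minimal sketch of one simple state-based CRDT, a last-writer-wins (LWW) map. It is not the authors' design: the abstract does not say which CRDTs the system uses, and the class `LWWMap`, its methods `put`, `get`, and `merge`, and keys such as `workload/web` are hypothetical. The sketch assumes that each cluster holds a full replica and periodically exchanges state with peers. Merging keeps the entry with the larger (logical timestamp, replica id) stamp for each key. This operation is commutative, associative, and idempotent, so replicas converge no matter how often or in what order they merge.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Version stamp: (logical timestamp, replica id). Tuples compare
# lexicographically, so equal timestamps are broken by replica id.
Stamp = Tuple[int, str]


@dataclass
class LWWMap:
    """Hypothetical state-based last-writer-wins map (a simple CRDT).

    Each cluster would hold one replica; replicas exchange full state
    and combine it with merge().
    """
    replica_id: str
    clock: int = 0
    entries: Dict[str, Tuple[Stamp, str]] = field(default_factory=dict)

    def put(self, key: str, value: str) -> None:
        # Advance the local Lamport-style clock, then stamp the write.
        self.clock += 1
        self.entries[key] = ((self.clock, self.replica_id), value)

    def get(self, key: str) -> Optional[str]:
        entry = self.entries.get(key)
        return None if entry is None else entry[1]

    def merge(self, other: "LWWMap") -> None:
        # Per-key maximum by stamp: commutative, associative, idempotent.
        for key, (stamp, value) in other.entries.items():
            mine = self.entries.get(key)
            if mine is None or stamp > mine[0]:
                self.entries[key] = (stamp, value)
        # Move the clock past every stamp seen, so a later local write
        # supersedes the merged state (Lamport clock rule).
        self.clock = max(self.clock, other.clock)


if __name__ == "__main__":
    a, b = LWWMap("cluster-a"), LWWMap("cluster-b")
    a.put("cluster-a/status", "healthy")
    b.put("cluster-b/status", "degraded")
    # Concurrent writes to the same key on two different replicas.
    a.put("workload/web", "cluster-a")
    b.put("workload/web", "cluster-b")
    a.merge(b)
    b.merge(a)
    assert a.entries == b.entries  # both replicas converge
    print(a.get("workload/web"))  # prints cluster-b
```

An LWW map silently discards one of two concurrent writes to the same key. A real federation control plane might therefore prefer richer types, such as observed-remove sets or delta-state CRDTs, for data where lost updates matter.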
