Scaling a Declarative Cluster Manager Architecture with Query Optimization Techniques

Cluster managers play a crucial role in data centers by distributing workloads among infrastructure resources. Declarative Cluster Management (DCM) is a new cluster management architecture that lets users express placement policies declaratively using SQL-like queries. This paper presents our experience scaling this architecture from moderate-sized enterprise clusters (10^2 to 10^3 nodes) to hyperscale clusters (10^4 nodes) via query optimization techniques. First, we formally specify the syntax and semantics of DCM's declarative language, C-SQL, a SQL variant used to express constraint optimization problems. We then show how constraints on the desired state of the cluster can be succinctly represented as C-SQL programs, and how query optimization techniques such as incremental view maintenance and predicate pushdown can speed up the execution of C-SQL programs. We evaluate the effectiveness of our optimizations through a case study of building Kubernetes schedulers using C-SQL. Our optimizations reduced database latency by almost 3000× and shrank optimization problems to as little as 1/300 of their original size, without affecting the quality of the scheduling solutions.
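The two optimizations named above can be illustrated with a small sketch. It is not C-SQL and not DCM's implementation; the class `SpareCapacityView`, the functions `full_recompute` and `candidate_nodes`, and the CPU-only resource model are all hypothetical names and assumptions chosen for illustration. The sketch maintains a per-node "spare capacity" view incrementally as pods are added or removed, and applies a capacity predicate before any solver would see the candidate nodes.

```python
from collections import defaultdict


class SpareCapacityView:
    """Hypothetical materialized view: node -> capacity minus assigned CPU."""

    def __init__(self, capacities):
        self.capacities = dict(capacities)
        self.used = defaultdict(int)

    def apply_delta(self, node, cpu_request, sign=1):
        # Incremental view maintenance: update only the affected node.
        # sign=1 models a pod insertion, sign=-1 a pod deletion.
        self.used[node] += sign * cpu_request

    def spare(self, node):
        return self.capacities[node] - self.used[node]


def full_recompute(capacities, assignments):
    """Baseline: rebuild the whole view by scanning every assignment."""
    used = defaultdict(int)
    for node, cpu in assignments:
        used[node] += cpu
    return {node: cap - used[node] for node, cap in capacities.items()}


def candidate_nodes(view, cpu_request):
    # Predicate pushdown (in spirit): discard nodes that cannot fit the pod
    # before they become variables in an optimization problem.
    return [n for n in view.capacities if view.spare(n) >= cpu_request]


capacities = {"n1": 8, "n2": 4, "n3": 2}
assignments = [("n1", 6), ("n2", 1)]

view = SpareCapacityView(capacities)
for node, cpu in assignments:
    view.apply_delta(node, cpu)

# The incrementally maintained view agrees with a full recomputation.
assert {n: view.spare(n) for n in capacities} == full_recompute(capacities, assignments)

print(candidate_nodes(view, 3))
```

The incremental update touches one entry per change, whereas the baseline rescans every assignment; filtering candidates early is what keeps the downstream problem small.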
