Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP

Resource allocation problems in many computer systems can be formulated as mathematical optimization problems. However, finding exact solutions to these problems using off-the-shelf solvers is often intractable for large problem sizes with tight SLAs, leading system designers to rely on cheap, heuristic algorithms. We observe that many allocation problems are granular: they consist of a large number of clients and resources, each client requests a small fraction of the total number of resources, and clients can interchangeably use different resources. For these problems, we propose an alternative approach that reuses the original optimization problem formulation and leads to better allocations than domain-specific heuristics. Our technique, Partitioned Optimization Problems (POP), randomly splits the problem into smaller problems (each with a subset of the clients and resources in the system) and coalesces the resulting sub-allocations into a global allocation for all clients. We provide theoretical and empirical evidence as to why random partitioning works well. In our experiments, POP achieves allocations within 1.5% of the optimal with orders-of-magnitude improvements in runtime compared to existing systems for cluster scheduling, traffic engineering, and load balancing.
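The partition-solve-coalesce idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `pop_allocate` and `solve_subproblem` are hypothetical, the resources are assumed to be fully interchangeable (so each sub-problem simply pools the capacity of its resources), and a simple max-min fair water-filling routine stands in for the off-the-shelf optimization solver that the original formulation would use.

```python
import random


def solve_subproblem(demands, capacity):
    """Stand-in solver (assumption): max-min fair water-filling.

    Splits `capacity` among the clients in `demands` so that no client
    receives more than it requested. A real system would call an
    optimization solver on the original problem formulation here.
    """
    alloc = {}
    remaining = capacity
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    for i, (client, demand) in enumerate(pending):
        fair_share = remaining / (len(pending) - i)
        alloc[client] = min(demand, fair_share)
        remaining -= alloc[client]
    return alloc


def pop_allocate(demands, capacities, k, seed=0):
    """Randomly split clients and resources into k sub-problems,
    solve each independently, and coalesce the sub-allocations."""
    rng = random.Random(seed)
    clients = list(demands)
    resources = list(range(len(capacities)))
    rng.shuffle(clients)
    rng.shuffle(resources)

    allocation = {}
    for i in range(k):
        # Sub-problem i gets every k-th client and every k-th resource
        # of the shuffled lists, i.e. a random subset of each.
        sub_demands = {c: demands[c] for c in clients[i::k]}
        sub_capacity = sum(capacities[r] for r in resources[i::k])
        # Coalescing is a plain union because the client subsets are disjoint.
        allocation.update(solve_subproblem(sub_demands, sub_capacity))
    return allocation


if __name__ == "__main__":
    demands = {f"c{i}": 1.0 + (i % 5) for i in range(100)}
    capacities = [10.0] * 20
    result = pop_allocate(demands, capacities, k=4)
    print(f"allocated {sum(result.values()):.1f} of {sum(capacities):.1f} units")
```

Because the sub-problems share no clients or resources, they can be solved independently (and in parallel), which is where the runtime savings would come from in this sketch; how close the coalesced result is to the global optimum depends on the problem being granular in the sense described above.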
