Self-Learning Threshold-Based Load Balancing

We consider a large-scale service system where incoming tasks have to be instantaneously dispatched to one out of many parallel server pools. The user-perceived performance degrades with the number of concurrent tasks and the dispatcher aims at maximizing the overall quality of service by balancing the load through a simple threshold policy. We demonstrate that such a policy is optimal on the fluid and diffusion scales, while only involving a small communication overhead, which is crucial for large-scale deployments. In order to set the threshold optimally, it is important, however, to learn the load of the system, which may be unknown. For that purpose, we design a control rule for tuning the threshold in an online manner. We derive conditions that guarantee that this adaptive threshold settles at the optimal value, along with estimates for the time until this happens. In addition, we provide numerical experiments that support the theoretical results and further indicate that our policy copes effectively with time-varying demand patterns.

Summary of Contribution: Data centers and cloud computing platforms are the digital factories of the world, and managing resources and workloads in these systems involves operations research challenges of an unprecedented scale. Due to the massive size, complex dynamics, and wide range of time scales, the design and implementation of optimal resource-allocation strategies is prohibitively demanding from a computation and communication perspective. These resource-allocation strategies are essential for certain interactive applications, for which the available computing resources need to be distributed optimally among users in order to provide the best overall experienced performance. This is the subject of the present article, which considers the problem of distributing tasks among the various server pools of a large-scale service system, with the objective of optimizing the overall quality of service provided to users.
A solution to this load-balancing problem cannot rely on maintaining complete state information at the gateway of the system, since this is computationally infeasible given the magnitude and complexity of modern data centers and cloud computing platforms. Therefore, we examine a computationally light load-balancing algorithm that is nonetheless asymptotically optimal in a regime where the size of the system approaches infinity. The analysis is based on a Markovian stochastic model, which is studied through fluid and diffusion limits in the aforementioned large-scale regime. The article analyzes the load-balancing algorithm theoretically and provides numerical experiments that support and extend the theoretical results.
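
To make the idea of a threshold policy with an online-tuned threshold concrete, the following is a minimal sketch under stated assumptions, not the dispatching or control rule analyzed in the article. It assumes the dispatcher can see the number of tasks in each pool (the article stresses that only a small communication overhead is needed, which this toy version ignores). The names `dispatch` and `adjust_threshold`, and the unit-step update, are hypothetical and chosen only for illustration.

```python
import random


def dispatch(occupancy, threshold, rng=random):
    """Pick the server pool that receives a new task.

    Illustrative assumption: prefer pools holding fewer than `threshold`
    tasks; if none exists, fall back to a uniformly random pool.
    """
    below = [i for i, n in enumerate(occupancy) if n < threshold]
    candidates = below if below else list(range(len(occupancy)))
    return rng.choice(candidates)


def adjust_threshold(occupancy, threshold):
    """Hypothetical online tuning rule (not the rule from the article).

    Raise the threshold when every pool has reached it (the threshold is
    too low for the offered load); lower it when every pool sits strictly
    below threshold - 1 (the threshold is too high); otherwise keep it.
    """
    if all(n >= threshold for n in occupancy):
        return threshold + 1
    if threshold > 1 and all(n < threshold - 1 for n in occupancy):
        return threshold - 1
    return threshold


# Only pool 1 is below the threshold, so it must receive the task.
chosen = dispatch([3, 1, 3], threshold=2)
# All pools have reached the threshold, so the sketch raises it by one.
raised = adjust_threshold([2, 2, 2], threshold=2)
```

In this sketch the threshold moves by at most one per update, so it drifts toward a level consistent with the observed occupancy without the load being known in advance; how such a rule should be designed so that the threshold provably settles at the optimal value is what the article establishes.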
