Optimal Service Elasticity in Large-Scale Distributed Systems

A fundamental challenge in large-scale cloud networks and data centers is to achieve highly efficient server utilization and limit energy consumption, while providing excellent user-perceived performance in the presence of uncertain and time-varying demand patterns. Auto-scaling provides a popular paradigm for automatically adjusting service capacity in response to demand while meeting performance targets, and queue-driven auto-scaling techniques have been widely investigated in the literature. In typical data center architectures and cloud environments, however, no centralized queue is maintained, and load balancing algorithms immediately distribute incoming tasks among parallel queues. In these distributed settings with vast numbers of servers, centralized queue-driven auto-scaling techniques involve substantial communication overhead and a major implementation burden, or may not be viable at all. Motivated by these issues, we propose a joint auto-scaling and load balancing scheme that does not require any global queue length information or explicit knowledge of system parameters, and yet provides provably near-optimal service elasticity. We establish the fluid-level dynamics for the proposed scheme in a regime where the total traffic volume and nominal service capacity grow large in proportion. The fluid-limit results show that the proposed scheme achieves asymptotic optimality in terms of both user-perceived delay performance and energy consumption. Specifically, we prove that both the waiting time of tasks and the relative energy portion consumed by idle servers vanish in the limit. At the same time, the proposed scheme operates in a distributed fashion and involves only constant communication overhead per task, thus ensuring scalability in massive data center operations. Extensive simulation experiments corroborate the fluid-limit results, and demonstrate that the proposed scheme can match the user performance and energy consumption of state-of-the-art approaches that do take full advantage of a centralized queue.
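
As a rough illustration of the two asymptotic claims, the scaling regime and the optimality statements might be written as follows. All symbols here ($N$, $\lambda$, $W^N$, $I^N$) are notation assumed for this sketch and are not taken from the abstract, and the exact mode of convergence established in the paper may differ.

```latex
% Assumed notation (not from the abstract):
%   N      : nominal number of servers
%   \lambda: normalized arrival rate per server, so total traffic scales with N
%   W^N    : waiting time of a typical task in the N-th system
%   I^N    : number of servers that are switched on but idle in the N-th system
\begin{align*}
  \lambda_N &= \lambda N
    && \text{(traffic volume grows in proportion to capacity)} \\
  W^N &\longrightarrow 0
    && \text{(waiting time vanishes as } N \to \infty\text{)} \\
  \frac{I^N}{N} &\longrightarrow 0
    && \text{(fraction of idle-on servers, and hence their energy share, vanishes)}
\end{align*}
```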
