Scheduling in Parallel Finite Buffer Systems: Optimal Decisions under Delayed Feedback

Scheduling decisions in parallel queuing systems are a fundamental problem underlying the dimensioning and operation of many computing and communication systems, such as job routing in data center clusters, multipath communication, and Big Data systems. In essence, the scheduler maps each arriving job to one of the possibly heterogeneous servers while aiming at an optimization goal such as load balancing, low average delay, or a low loss rate. A main difficulty in finding optimal scheduling decisions is that the scheduler only partially observes the impact of its decisions, e.g., through the delayed acknowledgements of the served jobs. In this paper, we provide a partially observable (PO) model that captures the scheduling decisions in parallel queuing systems under the limited information provided by delayed acknowledgements. We present a simulation model for this PO system to find a near-optimal scheduling policy in real time using a scalable Monte Carlo tree search algorithm. We numerically show that the resulting policy outperforms other limited-information scheduling strategies such as variants of Join-the-Most-Observations, and has performance comparable to full-information strategies such as Join-the-Shortest-Queue, Join-the-Shortest-Queue(d), and Shortest-Expected-Delay. Finally, we show how our approach optimizes real-time parallel processing using network data provided by Kaggle.
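The abstract describes the approach only at a high level. As a rough illustration, and not the authors' implementation, the following Python sketch shows how a UCT-style Monte Carlo tree search could pick a server for an arriving job by sampling queue states from a belief and rolling out a simplified generative model of the parallel finite-buffer queues. Every name, the simplified dynamics, and all parameters (QueueSimulator, uct_search, the reward weights) are hypothetical assumptions made for this sketch.

    # Hypothetical sketch: UCT-style Monte Carlo tree search over a simulated
    # parallel finite-buffer queue system with delayed acknowledgements.
    # All names, dynamics, and parameters are illustrative, not the paper's code.
    import math
    import random

    class QueueSimulator:
        """Generative model: K heterogeneous servers with finite buffers.
        The scheduler only observes acknowledgements of completed jobs,
        which arrive one step after the fact (delayed feedback)."""
        def __init__(self, service_rates, buffer_size=10):
            self.mu = service_rates   # per-server service rates (assumed)
            self.B = buffer_size      # finite buffer per server (assumed)

        def step(self, queues, action):
            """Route one arriving job to server `action`, then simulate one
            time unit. Returns (next_queues, observed_acks, reward)."""
            q = list(queues)
            loss = 0
            if q[action] < self.B:
                q[action] += 1
            else:
                loss = 1              # job dropped at a full buffer
            acks = []
            for i, mu_i in enumerate(self.mu):
                # at most one departure per step, with probability
                # 1 - exp(-mu_i) if the queue is non-empty (simplification)
                served = min(q[i], 1 if random.random() < 1 - math.exp(-mu_i) else 0)
                q[i] -= served
                acks.append(served)   # acknowledgement seen only next step
            reward = -loss - 0.01 * sum(q)   # penalize losses and backlog
            return tuple(q), tuple(acks), reward

    def uct_search(sim, belief_states, n_actions, n_iter=2000, c=1.4, depth=8):
        """Choose an action by UCB1 selection at the root and random rollouts
        from queue states sampled out of the current belief."""
        visits = [0] * n_actions
        value = [0.0] * n_actions
        for _ in range(n_iter):
            state = random.choice(belief_states)  # sample from the belief
            total = sum(visits) + 1
            a = max(range(n_actions),
                    key=lambda i: float('inf') if visits[i] == 0
                    else value[i] / visits[i] + c * math.sqrt(math.log(total) / visits[i]))
            g, s, act = 0.0, state, a             # rollout from sampled state
            for _ in range(depth):
                s, _, r = sim.step(s, act)
                g += r
                act = random.randrange(n_actions)
            visits[a] += 1
            value[a] += g
        return max(range(n_actions), key=lambda i: value[i] / max(visits[i], 1))

For instance, uct_search(QueueSimulator([1.0, 0.5]), belief_states=[(0, 0)], n_actions=2) returns the index of the server the search prefers; in the partially observable setting, the belief_states list would be maintained from the delayed acknowledgements, e.g., by a particle filter as in POMCP-style planners.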
