Reinforcement Learning for Optimal Control of Queueing Systems

With the rapid advance of information technology, network systems have become increasingly complex, and their underlying dynamics are typically unknown or difficult to characterize. Finding a good network control policy is essential to achieving desirable network performance (e.g., high throughput or low average job delay). Online/sequential learning algorithms are well suited to learning the optimal control policy from observed data when the underlying system dynamics are not known. In this work, we use model-based reinforcement learning (RL) to learn the optimal control policy of queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Existing RL techniques, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called Piecewise Decaying $\epsilon$-Greedy Reinforcement Learning (PDGRL), which applies model-based RL methods over a finite subset of the state space. We establish that the average queue backlog under PDGRL, with an appropriately constructed subset, can be made arbitrarily close to the optimal average queue backlog. We evaluate PDGRL on dynamic server allocation and routing problems. Simulations show that PDGRL minimizes the average queue backlog effectively.
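
The equivalence between average job delay and average queue backlog stated above can be read as an instance of Little's law. The symbols below ($\bar{Q}$, $\bar{W}$, $\lambda$, and the policy superscript $\pi$) are notation assumed here for illustration and are not taken from the paper; the relation also assumes a stable system whose exogenous arrival rate does not depend on the control policy.

```latex
% Little's law for a stable queueing system:
%   \bar{Q}  = time-average number of jobs in the system (queue backlog)
%   \bar{W}  = average time a job spends in the system (job delay)
%   \lambda  = long-run average arrival rate of jobs
\[
  \bar{Q} \;=\; \lambda \, \bar{W}
\]
% If \lambda is fixed and independent of the policy \pi, the two objectives
% differ only by the positive constant \lambda, so they share minimizers:
\[
  \arg\min_{\pi} \bar{W}^{\pi}
  \;=\;
  \arg\min_{\pi} \frac{\bar{Q}^{\pi}}{\lambda}
  \;=\;
  \arg\min_{\pi} \bar{Q}^{\pi}
\]
```

Under these assumptions, a policy that minimizes the average backlog also minimizes the average delay, which is why the backlog can serve as the learning objective.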
