Cellular Network Traffic Scheduling With Deep Reinforcement Learning

Modern mobile networks face unprecedented growth in demand from a new class of traffic generated by Internet of Things (IoT) devices such as smart wearables and autonomous cars. Future networks must schedule delay-tolerant software updates, data backups, and other transfers from IoT devices while maintaining strict service guarantees for conventional real-time applications such as voice calling and video. This problem is challenging because conventional traffic is highly dynamic across space and time, and its performance degrades significantly if all IoT traffic is scheduled as soon as it originates. In this paper, we present a reinforcement learning (RL) based scheduler that dynamically adapts to traffic variation, and to the various reward functions set by network operators, to optimally schedule IoT traffic. Using 4 weeks of real network data from downtown Melbourne, Australia, spanning diverse traffic patterns, we demonstrate that our RL scheduler enables mobile networks to carry 14.7% more data with minimal impact on existing traffic, and outperforms heuristic schedulers by more than 2×. Our work is a valuable step towards designing autonomous, “self-driving” networks that learn to manage themselves from past data.
