A Decoupled Learning Strategy for Massive Access Optimization in Cellular IoT Networks

Cellular networks are expected to provide connectivity for massive Internet of Things (mIoT) systems. However, their Random Access CHannel (RACH) procedure suffers from unreliability due to collisions caused by simultaneous massive access. Although this collision problem has been addressed in existing RACH schemes, these schemes usually organize IoT devices' transmissions and re-transmissions with fixed parameters, and thus can hardly adapt to time-varying traffic patterns. Without adaptation, the RACH procedure easily suffers from high access delay, high energy consumption, or even access unavailability. To improve the RACH procedure, this paper optimizes it in real time by maximizing a long-term hybrid multi-objective function, which consists of the number of devices that access successfully, the average energy consumption, and the average access delay. To do so, we first optimize the long-term objective in the number of successfully accessed devices by using Deep Reinforcement Learning (DRL) algorithms for different RACH schemes, including Access Class Barring (ACB), Back-Off (BO), and Distributed Queuing (DQ). The convergence capability and efficiency of different DRL algorithms, including Policy Gradient (PG), Actor-Critic (AC), Deep Q-Network (DQN), and Deep Deterministic Policy Gradient (DDPG), are compared. Inspired by the results of this comparison, a decoupled learning strategy is developed to jointly and dynamically adapt the access control factors of the three access schemes. This decoupled strategy first leverages a Recurrent Neural Network (RNN) model to predict the real-time traffic values of the network environment, and then uses multiple DRL agents to cooperatively configure the parameters of each RACH scheme.
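The decoupled structure described above (a traffic predictor whose output feeds per-scheme control agents, judged by a hybrid success/energy/delay objective) can be illustrated with a deliberately simplified sketch. Everything below is hypothetical scaffolding, not the paper's implementation: `TinyTrafficPredictor` replaces the RNN with exponential smoothing, `acb_barring_factor` replaces a learned DRL agent with a closed-form dynamic-ACB rule, and the weights in `hybrid_reward` are arbitrary placeholders for the multi-objective trade-off.

```python
class TinyTrafficPredictor:
    """Toy stand-in for an RNN traffic predictor (hypothetical:
    one exponential-smoothing state instead of a trained recurrent model)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha   # smoothing weight on the newest observation
        self.state = 0.0     # current backlog estimate

    def update(self, observed_backlog):
        # Blend the newest observed backlog into the running estimate
        # and return the prediction for the next access slot.
        self.state = self.alpha * observed_backlog + (1 - self.alpha) * self.state
        return self.state


def acb_barring_factor(predicted_backlog, n_preambles=54):
    """Closed-form ACB rule standing in for a DRL agent: admit roughly
    n_preambles contenders per slot (54 is a commonly used number of
    contention preambles in LTE; treat it as an assumption here).
    Contention throughput peaks when the expected number of transmitting
    devices matches the number of available preambles."""
    if predicted_backlog <= n_preambles:
        return 1.0  # light load: bar no one
    return n_preambles / predicted_backlog


def hybrid_reward(n_success, avg_energy, avg_delay, weights=(1.0, 0.1, 0.1)):
    """Scalarized form of the hybrid multi-objective: reward successful
    accesses, penalize average energy and delay. Weights are illustrative."""
    w_s, w_e, w_d = weights
    return w_s * n_success - w_e * avg_energy - w_d * avg_delay
```

A per-slot loop would then call `predictor.update()` on the observed backlog, map the prediction to a barring factor (or, analogously, a back-off window or DQ tree parameter), and use `hybrid_reward` as the learning signal for the agents.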
