Reinforcement Learning for Real-Time Optimization in NB-IoT Networks

NarrowBand Internet of Things (NB-IoT) is an emerging cellular-based technology that offers a range of flexible configurations for massive IoT radio access from groups of devices with heterogeneous requirements. A configuration specifies the amount of radio resource allocated to each group of devices for random access and for data transmission. When the traffic statistics are unknown, an important challenge is how to determine, in an online fashion, the configuration that maximizes the long-term average number of served IoT devices at each transmission time interval (TTI). Given the complexity of searching for the optimal configuration, we first develop real-time configuration selection based on tabular Q-learning (tabular-Q), linear approximation-based Q-learning (LA-Q), and deep neural network-based Q-learning (DQN) in the single-parameter single-group scenario. Our results show that the proposed reinforcement learning-based approaches considerably outperform the conventional heuristic approaches based on load estimation (LE-URC) in terms of the number of served IoT devices. They also indicate that LA-Q and DQN are good alternatives to tabular-Q, achieving almost the same performance with much less training time. We further advance LA-Q and DQN via action aggregation (AA-LA-Q and AA-DQN) and via cooperative multi-agent learning (CMA-DQN) for the multi-parameter multi-group scenario, thereby solving the problem that Q-learning agents do not converge in high-dimensional configurations. In this scenario, the advantage of the proposed Q-learning approaches over the conventional LE-URC approach grows as the configuration dimension increases, and the CMA-DQN approach outperforms the other approaches in both throughput and training efficiency.
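
To make the tabular-Q idea concrete, the following is a minimal sketch of online configuration selection with tabular Q-learning. It is not the paper's NB-IoT model: the environment (`toy_step`), the slot budget, the action set, the arrival process, and all hyperparameters are hypothetical choices made only for illustration. The state is a clipped device backlog, the action is how many slots per TTI go to random access (the rest carry data), and the reward is the number of devices served in that TTI.

```python
import random

ACCESS_SLOTS = (1, 2, 3)  # hypothetical action set: slots given to random access
TOTAL_SLOTS = 4           # hypothetical per-TTI budget; remaining slots carry data
MAX_BACKLOG = 4           # backlog is clipped so the state space stays tabular


def toy_step(backlog, action, rng):
    """Toy TTI (an assumption, collisions ignored): returns (served, next backlog)."""
    access = ACCESS_SLOTS[action]
    data = TOTAL_SLOTS - access
    # A device is served only if it gets both an access slot and a data slot.
    served = min(backlog, access, data)
    arrivals = rng.randint(0, 2)  # assumed traffic, unknown to the agent
    return served, min(MAX_BACKLOG, backlog - served + arrivals)


def greedy_action(q, state):
    """Index of the highest-valued action in this state (first one on ties)."""
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Standard Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (td_target - q[state][action])


def train(steps=50_000, alpha=0.05, gamma=0.5, epsilon=0.2, seed=0):
    """Learn a Q-table online with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACCESS_SLOTS) for _ in range(MAX_BACKLOG + 1)]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(ACCESS_SLOTS))  # explore
        else:
            action = greedy_action(q, state)           # exploit
        reward, next_state = toy_step(state, action, rng)
        q_update(q, state, action, reward, next_state, alpha, gamma)
        state = next_state
    return q


if __name__ == "__main__":
    q_table = train()
    for backlog, row in enumerate(q_table):
        best = ACCESS_SLOTS[greedy_action(q_table, backlog)]
        print(backlog, [round(v, 2) for v in row], "-> access slots:", best)
```

In this toy setting, giving too few slots to random access starves data transmission of admitted devices, while giving too many leaves no room for data, so the agent has to learn a balanced split from rewards alone. The table holds one entry per state-action pair, so it grows quickly as more parameters and device groups are added, which is the scaling issue that function approximation (LA-Q, DQN) is meant to ease.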
