A Deep-Reinforcement-Learning-Based Approach to Dynamic eMBB/URLLC Multiplexing in 5G NR

This article investigates the dynamic multiplexing of enhanced mobile broadband (eMBB) and ultra-reliable low-latency communications (URLLC) traffic on the same channel in 5G NR. Because the two services operate on very different transmission time scales, URLLC uses preemptive puncturing to multiplex its traffic onto ongoing eMBB transmissions. The optimization problem is to minimize the adverse impact of this preemptive puncturing on eMBB users. We present DEMUX, a model-free deep reinforcement learning (DRL)-based solution to this problem. The essence of DEMUX is to use deep function approximators (neural networks) to learn an optimal algorithm for determining the preemption solution in each eMBB transmission time interval (TTI). Our novel contributions in the design of DEMUX include the first use of a DRL method with a large, continuous action domain for resource scheduling in NR; a mechanism that ensures fast and stable learning convergence by exploiting intrinsic properties of the problem; and a mechanism that obtains a feasible preemption solution from the unconstrained output of a neural network while minimizing information loss. Experimental results show that DEMUX significantly outperforms state-of-the-art algorithms proposed by the 3GPP standards body and in the literature.
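The abstract does not spell out how DEMUX turns an unconstrained neural-network output into a feasible preemption decision. The sketch below illustrates one generic way such a mapping could work; it is not the DEMUX mechanism. It assumes a hypothetical setup in which the actor emits one real-valued score per eMBB user, each user k has an integer number of puncturable resource units `caps[k]`, and the URLLC load to place in the TTI is `demand` units. The scores are turned into weights with a softmax, split proportionally with per-user caps (water-filling), and rounded to integers with a largest-remainder rule so the total stays exactly equal to `demand`. All function names are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: unconstrained scores -> positive weights summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def capped_proportional(weights: List[float], caps: List[int], demand: int) -> List[float]:
    """Split `demand` in proportion to `weights` without exceeding any cap (water-filling).

    Users whose proportional share would reach their cap are pinned at the cap,
    and the remaining demand is re-split among the users that are still free.
    """
    alloc = [0.0] * len(weights)
    active = set(range(len(weights)))
    remaining = float(demand)
    while active and remaining > 1e-12:
        wsum = sum(weights[k] for k in active)
        saturated = [k for k in active if remaining * weights[k] / wsum >= caps[k]]
        if not saturated:
            for k in active:
                alloc[k] = remaining * weights[k] / wsum
            break
        for k in saturated:
            alloc[k] = float(caps[k])
            remaining -= caps[k]
            active.discard(k)
    return alloc


def round_with_caps(alloc: List[float], caps: List[int], demand: int) -> List[int]:
    """Largest-remainder rounding: floor every share, then give the leftover units
    to users with the largest fractional parts that still have room under their cap."""
    x = [min(math.floor(a), c) for a, c in zip(alloc, caps)]
    leftover = demand - sum(x)
    order = sorted(range(len(alloc)),
                   key=lambda k: alloc[k] - math.floor(alloc[k]), reverse=True)
    while leftover > 0:  # terminates because demand <= sum(caps) is checked by the caller
        for k in order:
            if leftover == 0:
                break
            if x[k] < caps[k]:
                x[k] += 1
                leftover -= 1
    return x


def feasible_preemption(scores: List[float], caps: List[int], demand: int) -> List[int]:
    """Hypothetical projection of raw actor scores onto integer puncturing counts
    x with 0 <= x[k] <= caps[k] and sum(x) == demand."""
    if demand < 0 or demand > sum(caps):
        raise ValueError("URLLC demand cannot be served by puncturing the given eMBB users")
    if demand == 0:
        return [0] * len(caps)
    weights = softmax(scores)
    alloc = capped_proportional(weights, caps, demand)
    return round_with_caps(alloc, caps, demand)


if __name__ == "__main__":
    # Three eMBB users; the actor strongly prefers puncturing user 0,
    # but user 0 can only give up 4 units, so the rest spills to users 1 and 2.
    print(feasible_preemption([2.0, 0.0, 0.0], caps=[4, 10, 10], demand=9))
```

In this sketch the softmax keeps the actor's ranking of users, the water-filling step only departs from the proportional split where a cap forces it, and the rounding step preserves the exact URLLC total. A differentiable relaxation or a different projection could equally be used; the choice here is for illustration only.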