Deep Reinforcement Learning With Discrete Normalized Advantage Functions for Resource Management in Network Slicing

Network slicing promises to provision diversified services with distinct requirements on one common infrastructure. Deep reinforcement learning (e.g., deep $\mathcal{Q}$-learning, DQL) is considered an appropriate algorithm for the demand-aware inter-slice resource management problem in network slicing, regarding the varying demands and the allocated bandwidth as the environment \textit{state} and the \textit{action}, respectively. However, allocating bandwidth at a finer resolution usually implies a larger action space, and DQL fails to converge quickly in that case. In this letter, we introduce discrete normalized advantage functions (DNAF) into DQL by separating the $\mathcal{Q}$-value function into a state-value function term and an advantage term, and by exploiting a deterministic policy gradient descent (DPGD) algorithm to avoid the unnecessary calculation of the $\mathcal{Q}$-value for every state-action pair. Furthermore, since DPGD only works in a continuous action space, we embed a k-nearest-neighbor algorithm into DQL to quickly find the valid action in the discrete space nearest to the DPGD output. Finally, we verify the faster convergence of the DNAF-based DQL through extensive simulations.
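
As a concrete illustration of the two ingredients above, the following is a minimal Python/PyTorch sketch, not the letter's implementation: a NAF-style head that decomposes $\mathcal{Q}(s,a) = V(s) + A(s,a)$ with a quadratic advantage centered on a greedy continuous action $\mu(s)$, plus a k-nearest-neighbor lookup that snaps that continuous output onto a valid discrete bandwidth allocation. All names (`NAFHead`, `nearest_discrete_actions`), layer sizes, and the example action grid are assumptions made for illustration.

```python
import itertools
import numpy as np
import torch
import torch.nn as nn

class NAFHead(nn.Module):
    """Q(s, a) = V(s) + A(s, a), with the NAF quadratic advantage
    A(s, a) = -0.5 * (a - mu(s))^T P(s) (a - mu(s)),  P(s) = L(s) L(s)^T."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.action_dim = action_dim
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.v = nn.Linear(hidden, 1)            # state-value term V(s)
        self.mu = nn.Linear(hidden, action_dim)  # greedy continuous action mu(s)
        # Entries of the lower-triangular factor L(s).
        self.l = nn.Linear(hidden, action_dim * (action_dim + 1) // 2)

    def forward(self, state, action):
        h = self.trunk(state)
        v, mu = self.v(h), self.mu(h)
        # Assemble L(s) with a positive (exponentiated) diagonal so that
        # P(s) = L L^T is positive definite and the advantage is <= 0.
        L = torch.zeros(state.shape[0], self.action_dim, self.action_dim)
        rows, cols = torch.tril_indices(self.action_dim, self.action_dim)
        L[:, rows, cols] = self.l(h)
        diag = torch.arange(self.action_dim)
        L[:, diag, diag] = L[:, diag, diag].exp()
        P = L @ L.transpose(1, 2)
        d = (action - mu).unsqueeze(-1)
        advantage = -0.5 * (d.transpose(1, 2) @ P @ d).squeeze(-1)
        return v + advantage, mu                 # Q(s, a) and mu(s)

def nearest_discrete_actions(mu_out, valid_actions, k=5):
    """Return the k valid discrete allocations closest to the continuous mu(s)."""
    dist = np.linalg.norm(valid_actions - mu_out, axis=1)
    return valid_actions[np.argsort(dist)[:k]]

# Assumed discretization: split 10 MHz among 3 slices in 1-MHz steps.
grid = np.array([a for a in itertools.product(range(11), repeat=3) if sum(a) == 10],
                dtype=np.float32)
```

In use, the agent would take $\mu(s)$ from the head, retrieve its k nearest neighbors on the grid, evaluate $\mathcal{Q}(s,a)$ on those candidates, and execute the argmax, so the executed action always stays in the valid discrete space while the DPGD-style greedy step remains cheap.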
