Handover Control in Wireless Systems via Asynchronous Multiuser Deep Reinforcement Learning

In this paper, we propose a two-layer framework to learn optimal handover (HO) controllers in possibly large-scale wireless systems supporting mobile Internet-of-Things or traditional cellular users with heterogeneous mobility patterns. In particular, our framework first partitions the user equipments (UEs) into clusters such that UEs within the same cluster have similar mobility patterns. Then, within each cluster, an asynchronous multiuser deep reinforcement learning (RL) scheme is developed to control the HO processes across the UEs, with the goal of lowering the HO rate while maintaining a target system throughput. In this scheme, the HO controller is a deep neural network (DNN) that the UEs learn collaboratively via RL. Moreover, we initialize the DNN controller with supervised learning before running RL, both to exploit the domain knowledge embedded in traditional HO schemes and to mitigate the negative effects of random exploration in the initial stage. Furthermore, we show that the adopted global-parameter-based asynchronous framework trains faster as more UEs participate, which naturally addresses the scalability issue in large systems. Finally, simulation results demonstrate that the proposed framework achieves lower HO rates than state-of-the-art online schemes.
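
To make the two-layer pipeline concrete, here is a minimal Python sketch: k-means clustering of per-UE mobility features, a supervised warm start of a toy linear controller, and lock-free asynchronous (A3C/Hogwild-style) updates of shared global weights by parallel UE threads. All names (`features`, `worker`, `global_theta`), the linear controller, and the random gradient surrogate are illustrative assumptions, not the paper's actual DNN controller, reward, or training code.

```python
import threading

import numpy as np

rng = np.random.default_rng(0)

# ----- Layer 1: partition UEs by mobility pattern (plain k-means) -----
def kmeans(X, k, iters=50):
    """Assign each UE's mobility-feature vector to one of k clusters."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Hypothetical per-UE mobility features, e.g. (mean speed, turn rate).
features = rng.normal(size=(100, 2))
cluster_of_ue = kmeans(features, k=3)

# ----- Supervised warm start of a toy linear HO controller -----
# Fit the controller to imitate a traditional HO rule (the labels below
# are random stand-ins for, e.g., RSRP-with-hysteresis decisions) so that
# RL does not start from purely random exploration.
dim = 16
Phi = rng.normal(size=(500, dim))                    # toy state features
y_rule = rng.integers(0, 2, size=500).astype(float)  # toy HO decisions
global_theta = np.linalg.lstsq(Phi, y_rule, rcond=None)[0]

# ----- Layer 2: asynchronous multiuser RL on shared global weights -----
def worker(ue_id, steps=200, lr=1e-3):
    """One UE's training thread: compute a local gradient estimate and
    apply it to the shared parameters without locking (Hogwild-style).
    The random vector below is a surrogate for the true policy gradient
    of the HO objective (HO-rate penalty plus a throughput term)."""
    local_rng = np.random.default_rng(ue_id)
    for _ in range(steps):
        g = local_rng.normal(scale=0.01, size=global_theta.shape)
        global_theta[:] -= lr * g  # in-place, lock-free asynchronous update

# Every UE in a cluster trains the same shared controller in parallel,
# so adding UEs adds workers and shortens wall-clock training time.
ues_in_cluster_0 = np.flatnonzero(cluster_of_ue == 0)[:8]
threads = [threading.Thread(target=worker, args=(int(u),)) for u in ues_in_cluster_0]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("trained shared controller for cluster 0:", global_theta[:4])
```

The lock-free update is the design choice behind the scalability claim: each UE thread reads and writes the shared parameters independently, so throughput grows with the number of participating UEs rather than being serialized through a central learner.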
