Optimizing Throughput Performance in Distributed MIMO Wi-Fi Networks Using Deep Reinforcement Learning

This paper explores the feasibility of using deep reinforcement learning (DRL) for dynamic resource management in Wi-Fi networks that implement distributed multi-user MIMO (D-MIMO). D-MIMO is a technique in which a set of wireless access points is synchronized and grouped to jointly serve multiple users simultaneously. The paper addresses two dynamic resource management problems germane to D-MIMO Wi-Fi networks, both with the goal of maximizing user throughput: (i) assigning channels to D-MIMO groups, and (ii) deciding how to cluster access points into D-MIMO groups. These problems are known to be NP-hard, and only heuristic solutions exist in the literature. We construct a DRL framework in which a learning agent interacts with a D-MIMO Wi-Fi network, learns about the network environment, and converges to policies that address both problems. Through extensive simulations and online training on D-MIMO Wi-Fi networks, the paper demonstrates that DRL agents achieve a 20% improvement in user throughput over heuristic solutions, particularly when network conditions are dynamic. The work also shows that DRL agents can meet multiple network objectives simultaneously, for instance maximizing user throughput together with the fairness of throughput among users.
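The abstract does not state how the throughput and fairness objectives are combined into a single learning signal. As a minimal sketch only, assuming Jain's fairness index as the fairness measure and a weighted-sum scalarization (both assumptions, not necessarily the paper's formulation), a per-step reward could look like the following. The names `jain_fairness`, `scalarized_reward`, `max_throughput`, and `fairness_weight` are hypothetical.

```python
from typing import Sequence


def jain_fairness(throughputs: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in [1/n, 1]."""
    n = len(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if n == 0 or sum_sq == 0:
        # No users or all-zero throughput: treat as zero fairness (assumption).
        return 0.0
    total = sum(throughputs)
    return (total * total) / (n * sum_sq)


def scalarized_reward(
    throughputs: Sequence[float],
    max_throughput: float,
    fairness_weight: float = 0.5,
) -> float:
    """Weighted sum of normalized mean throughput and Jain's index.

    max_throughput is an assumed normalization constant (e.g. a peak
    per-user rate), so both terms lie in [0, 1].
    """
    if not throughputs:
        return 0.0
    mean_norm = min(sum(throughputs) / (len(throughputs) * max_throughput), 1.0)
    fairness = jain_fairness(throughputs)
    return (1.0 - fairness_weight) * mean_norm + fairness_weight * fairness


if __name__ == "__main__":
    # Equal shares score higher on fairness than one user taking everything.
    print(scalarized_reward([10.0, 10.0, 10.0, 10.0], max_throughput=20.0))
    print(scalarized_reward([40.0, 0.0, 0.0, 0.0], max_throughput=20.0))
```

Setting `fairness_weight` to 0 recovers a pure throughput objective, while values closer to 1 favor an even split among users; the right trade-off would depend on the deployment.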
