Centralized and Distributed Deep Reinforcement Learning Methods for Downlink Sum-Rate Optimization

For a multi-cell, multi-user cellular network, downlink sum-rate maximization through power allocation is a nonconvex and NP-hard optimization problem. In this article, we present an effective approach to solving this problem through single- and multi-agent actor-critic deep reinforcement learning (DRL). Specifically, we use finite-horizon trust region optimization. Through extensive simulations, we show that we can simultaneously achieve higher spectral efficiency than state-of-the-art optimization algorithms such as weighted minimum mean-squared error (WMMSE) and fractional programming (FP), while offering execution times more than two orders of magnitude shorter than these approaches. Additionally, the proposed trust region methods demonstrate better performance and convergence properties than the Advantage Actor-Critic (A2C) DRL algorithm. In contrast to prior approaches, the proposed decentralized DRL approaches allow for distributed optimization with limited channel state information (CSI) and controllable information exchange between base stations (BSs), while offering competitive performance and reduced training times.
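
The optimization problem and the WMMSE baseline mentioned above can be made concrete with a short sketch. The Python below is an illustration under assumptions that are not taken from the paper: a single-antenna model in which every BS-user link is treated as one transmitter-receiver pair sharing a single band, gain[k][j] is the power gain from transmitter j to receiver k, all rate weights equal one, and each link has its own power budget p_max. The hypothetical function sum_rate evaluates the objective, the sum of log2(1 + SINR_k) over all links, and the hypothetical function wmmse implements a scalar WMMSE iteration (MMSE receive coefficients, MSE weights, then clipped closed-form transmit amplitudes). It is not the authors' DRL method or their simulation setup, and the toy channel in the demo is an assumed Rayleigh-fading example.

```python
import math
import random


def sum_rate(gain, power, noise):
    """Sum spectral efficiency in bit/s/Hz for a K-link interference channel.

    gain[k][j] is the power gain from transmitter j to receiver k, so
    gain[k][k] is the desired link and every j != k term is interference.
    """
    n = len(power)
    total = 0.0
    for k in range(n):
        interference = sum(gain[k][j] * power[j] for j in range(n) if j != k)
        sinr = gain[k][k] * power[k] / (interference + noise)
        total += math.log2(1.0 + sinr)
    return total


def wmmse(gain, p_max, noise, max_iters=100, tol=1e-6):
    """Scalar WMMSE power allocation with unit rate weights (illustrative).

    Works on transmit amplitudes v_k = sqrt(p_k) and channel amplitudes
    h_kj = sqrt(gain[k][j]), starting from full power on every link.
    """
    n = len(gain)
    h = [[math.sqrt(g) for g in row] for row in gain]
    v = [math.sqrt(p_max)] * n
    prev = sum_rate(gain, [x * x for x in v], noise)
    for _ in range(max_iters):
        # MMSE receive coefficients u_k and MSE weights w_k = 1 / MSE_k.
        u, w = [], []
        for k in range(n):
            received = sum(h[k][j] ** 2 * v[j] ** 2 for j in range(n)) + noise
            u_k = h[k][k] * v[k] / received
            u.append(u_k)
            w.append(1.0 / (1.0 - u_k * h[k][k] * v[k]))
        # Closed-form transmit amplitudes, projected onto [0, sqrt(p_max)].
        for k in range(n):
            denom = sum(w[j] * u[j] ** 2 * h[j][k] ** 2 for j in range(n))
            v[k] = min(max(w[k] * u[k] * h[k][k] / denom, 0.0), math.sqrt(p_max))
        rate = sum_rate(gain, [x * x for x in v], noise)
        if abs(rate - prev) < tol:
            break
        prev = rate
    return [x * x for x in v]


if __name__ == "__main__":
    # Toy channel (assumption): exponential power gains, i.e. Rayleigh fading,
    # with desired links on average 10x stronger than interfering links.
    rng = random.Random(0)
    K, P_MAX, NOISE = 4, 1.0, 0.1
    gain = [[(10.0 if k == j else 1.0) * rng.expovariate(1.0) for j in range(K)]
            for k in range(K)]
    full = sum_rate(gain, [P_MAX] * K, NOISE)
    alloc = wmmse(gain, P_MAX, NOISE)
    print(f"full power: {full:.3f} bit/s/Hz")
    print(f"WMMSE:      {sum_rate(gain, alloc, NOISE):.3f} bit/s/Hz")
```

Each WMMSE pass minimizes a weighted-MSE surrogate whose optimum over the receive coefficients and weights equals a constant minus the sum rate, so the sum rate never decreases from the full-power starting point. The iteration converges to a stationary point rather than a guaranteed global optimum, which is consistent with the problem being nonconvex and NP-hard.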
