Model-Free Learning of Optimal Ergodic Policies in Wireless Systems

Learning optimal resource allocation policies in wireless systems can be achieved effectively by formulating finite dimensional constrained programs that depend on the system configuration as well as on the adopted learning parameterization. The interest here is in cases where system models are unavailable, which calls for methods that probe the wireless system with candidate policies and then use the observed performance to determine better policies. This generic procedure is difficult because accurate gradient estimates must be extracted from a limited number of system queries. This article constructs and exploits smoothed surrogates of constrained ergodic resource allocation problems; the gradients of these surrogates can be represented exactly as averages of finite differences obtainable through limited system probing. Leveraging this property, we develop a new model-free primal-dual algorithm for learning optimal ergodic resource allocations, and we rigorously analyze the relationships between the original policy search problems and their surrogates in both the primal and dual domains. First, we show that both primal and dual domain surrogates are uniformly consistent approximations of their corresponding original finite dimensional counterparts. Upon further assuming the use of near-universal policy parameterizations, we also develop explicit bounds on the gap between the optimal values of the initial, infinite dimensional resource allocation problems and the dual values of their parameterized smoothed surrogates. In fact, we show that this duality gap decreases at a linear rate relative to the smoothing and universality parameters. Thus, it can be made arbitrarily small, which also justifies our proposed primal-dual algorithmic recipe. Numerical simulations confirm the effectiveness of our approach.
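The idea that a smoothed surrogate's gradient equals an average of finite differences can be illustrated with a minimal sketch. It assumes Gaussian smoothing, which the abstract does not specify, and uses a hypothetical black-box function `probe` as a stand-in for a wireless system that returns only observed performance. The names `probe`, `smoothed_gradient`, `mu`, and `num_probes` are illustrative and not taken from the article, and the sketch covers only the gradient estimate, not the primal-dual algorithm.

```python
import random


def probe(theta):
    """Hypothetical system query: returns a performance value, no gradient."""
    target = [1.0, -1.0]  # assumed toy optimum, for illustration only
    return sum((t - c) ** 2 for t, c in zip(theta, target))


def smoothed_gradient(f, theta, mu=0.01, num_probes=20000, rng=None):
    """Estimate the gradient of the Gaussian-smoothed surrogate
    f_mu(theta) = E[f(theta + mu * u)], u ~ N(0, I),
    as an average of finite differences ((f(theta + mu*u) - f(theta)) / mu) * u.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    base = f(theta)  # one unperturbed query, reused for every difference
    grad = [0.0] * len(theta)
    for _ in range(num_probes):
        u = [rng.gauss(0.0, 1.0) for _ in theta]  # random probing direction
        perturbed = f([t + mu * ui for t, ui in zip(theta, u)])
        scale = (perturbed - base) / mu  # finite difference along u
        for i, ui in enumerate(u):
            grad[i] += scale * ui
    return [g / num_probes for g in grad]


theta = [2.0, -3.0]
estimate = smoothed_gradient(probe, theta)
exact = [2.0 * (t - c) for t, c in zip(theta, [1.0, -1.0])]
print("estimate:", estimate)
print("exact:   ", exact)
```

For this quadratic toy objective, smoothing only adds a constant, so the surrogate's gradient coincides with the true gradient and the average of finite differences should approach it as the number of probes grows. For general objectives the two gradients differ by an amount controlled by the smoothing parameter `mu`.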
