A multi-armed bandit solver method for adaptive power allocation in device-to-device communication

Abstract: Device-to-device (D2D) communication has attracted considerable attention for future cellular networks because, by reusing spectrum resources, it helps to increase cellular capacity, improve user throughput, and extend the battery lifetime of user equipment (UE). However, D2D devices introduce interference into the system when they reuse these resources, and properly controlling that interference improves overall system performance. Adaptive power allocation among cellular and D2D users contributes to efficient interference management. In this paper, we propose an online power allocation method for D2D communication based on a multi-armed bandit solver. We use the proposed method to improve both the overall system throughput and the D2D throughput. We define the set of states for this learning algorithm using an appropriate number of system-defined variables, which enlarges the observation space and consequently improves the balance of spectrum usage. Finally, we compare the proposed method with an existing distributed reinforcement learning approach and with random resource allocation. Simulation results show that the proposed method outperforms these baselines in overall system throughput as well as D2D throughput by efficiently controlling interference levels.
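The abstract does not say which bandit algorithm the solver uses, so the following is only a minimal sketch of how a bandit could pick transmit power online. It assumes Exp3, a standard adversarial bandit algorithm, as a stand-in. The discrete power levels, the `toy_reward` function (a placeholder for normalized throughput), and all parameter values are hypothetical and are not taken from the paper.

```python
import math
import random


def exp3_power_allocation(power_levels, reward_fn, rounds, gamma=0.1, seed=0):
    """Run Exp3 over a discrete set of transmit power levels (the arms).

    reward_fn(power) must return a reward in [0, 1]; in this sketch it is
    a hypothetical stand-in for normalized throughput.
    """
    rng = random.Random(seed)
    k = len(power_levels)
    weights = [1.0] * k
    counts = [0] * k

    def distribution():
        # Mix the weight-based distribution with uniform exploration.
        total = sum(weights)
        return [(1 - gamma) * w / total + gamma / k for w in weights]

    for _ in range(rounds):
        probs = distribution()
        arm = rng.choices(range(k), weights=probs)[0]
        reward = reward_fn(power_levels[arm])
        # Importance weighting keeps the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / k)
        # Rescale to avoid overflow; the probabilities are unaffected.
        top = max(weights)
        weights = [w / top for w in weights]
        counts[arm] += 1

    return distribution(), counts


def toy_reward(power_dbm):
    """Assumed toy reward: peaks at 10 dBm and falls off on both sides."""
    return max(0.0, 1.0 - abs(power_dbm - 10) / 20.0)


levels = [0, 5, 10, 15, 20]  # hypothetical power levels in dBm
probs, counts = exp3_power_allocation(levels, toy_reward, rounds=5000)
```

In this toy setting the selection probabilities tend to concentrate on the power level with the highest reward, while the `gamma / k` exploration floor keeps every level in play. A real D2D reward would depend on measured interference and throughput rather than a fixed function of power.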
