Dynamic Bandwidth Allocation Scheme for Wireless Networks with Energy Harvesting Using Actor-Critic Deep Reinforcement Learning

In this paper, we propose an efficient bandwidth allocation scheme for heterogeneous wireless networks consisting of a single macro-cell base station (MBS) and several small-cell base stations (SBSs) that are powered by solar energy harvesters. We design an actor-critic deep reinforcement learning (RL) agent at the MBS (i.e., the main controller) that aims to maximize the user satisfaction ratio and the energy efficiency of the network. The RL agent learns the stochastic arrivals of traffic requests and harvested energy through direct interaction with the network environment, and can thus obtain the optimal bandwidth allocation policy for enhancing network sustainability and performance. To this end, we first formulate the bandwidth allocation problem as a Markov decision process and then employ the actor-critic RL algorithm to find the optimal bandwidth allocation policy. The actor and the critic of the RL agent use deep neural networks to approximate the policy function and the value function, respectively. More specifically, the actor generates actions based on the output of the policy network, while the critic uses the value network to help the actor evaluate the policy. Simulation results are presented to illustrate the performance of the proposed scheme.
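
The actor-critic interaction described above can be sketched in a few lines. The following is a minimal illustration, not the scheme proposed in the paper: it swaps the deep neural networks for linear function approximators so that it fits in a short NumPy example, and the class name, the three-element state vector (traffic demand, battery level, bias term), the three bandwidth levels, and all learning rates are hypothetical choices made only for illustration. The critic computes a one-step temporal-difference (TD) error, and the actor uses that error to scale a policy-gradient step.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


class LinearActorCritic:
    """One-step actor-critic with linear approximators (illustrative only).

    The paper uses deep neural networks for both the policy and the value
    function; linear models are an assumption made here to keep the sketch short.
    """

    def __init__(self, n_features, n_actions,
                 lr_actor=0.01, lr_critic=0.05, gamma=0.95, seed=0):
        self.theta = np.zeros((n_actions, n_features))  # actor (policy) weights
        self.w = np.zeros(n_features)                   # critic (value) weights
        self.lr_actor = lr_actor
        self.lr_critic = lr_critic
        self.gamma = gamma
        self.rng = np.random.default_rng(seed)

    def policy(self, state):
        # Probability of each action (e.g., each bandwidth level) in this state.
        return softmax(self.theta @ state)

    def value(self, state):
        # Critic's estimate of the expected discounted return from this state.
        return float(self.w @ state)

    def act(self, state):
        # Sample an action from the actor's current policy.
        return int(self.rng.choice(len(self.theta), p=self.policy(state)))

    def update(self, state, action, reward, next_state):
        # Critic: one-step TD error, delta = r + gamma * V(s') - V(s).
        td_error = reward + self.gamma * self.value(next_state) - self.value(state)
        # Actor: gradient of log pi(a|s) for a softmax-linear policy,
        # computed before the critic weights change.
        probs = self.policy(state)
        grad_log_pi = -np.outer(probs, state)
        grad_log_pi[action] += state
        # Move the value estimate toward the TD target.
        self.w += self.lr_critic * td_error * state
        # Make the taken action more likely if the TD error is positive.
        self.theta += self.lr_actor * td_error * grad_log_pi
        return td_error


# Hypothetical usage: state = [traffic demand, battery level, bias term].
agent = LinearActorCritic(n_features=3, n_actions=3)
state = np.array([0.5, 0.2, 1.0])
action = agent.act(state)
next_state = np.array([0.4, 0.3, 1.0])
delta = agent.update(state, action, reward=1.0, next_state=next_state)
```

In a full simulation, the reward would combine the user satisfaction ratio and the energy efficiency mentioned in the abstract, and the state transitions would come from the traffic and energy-harvesting models; neither is specified in this text, so both are left out of the sketch.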
